Appendix


INTRODUCTION 


The  goal  of  this  innovator  award  is  to  continue  to  develop  and  apply  RNAi-based 
screening  methods  to  discover  new  paths  towards  breast  cancer  treatment.  This  project  has 
three  aims.  The  first  is  to  perform  genome-wide  RNAi  screens  on  tumor-derived  cell  line  models 
to  identify  tumor-specific  vulnerabilities  and  understand  the  basis  of  therapy  resistance  to 
commonly  used  targeted  therapies.  Second  is  to  probe  the  roles  of  breast  cancer  stem  cells 
with  emphasis  on  DNA  methylation  profiling.  The  third  is  to  apply  novel,  focal  re-sequencing 
methods  developed  in  the  laboratory  to  uncover  genomic  rearrangements  that  contribute  to  the 
susceptibility  of  breast  cancer. 

BODY 

Fourth-generation (V4) RNAi resources

During  this  past  year,  we  have  completed  the  development  of  our  shRNA  prediction 
algorithm  called  shERWOOD.  Prior  algorithms  for  predicting  effective  RNAi  reagents  have 
suffered from two drawbacks when used to design shRNAs. First, they were derived from a
relatively small number of data points. Even the best algorithm used only approximately
2500 knockdown measurements, so very few effective sequences were available from which
the algorithm could learn their properties. In contrast, approximately 250,000
measurements of shRNA efficacy were used to train the shERWOOD algorithm. Second, until
now,  existing  algorithms  predict  siRNAs,  but  not  shRNAs.  shRNAs  have  more  sequence 
constraints  than  siRNAs  due  to  their  requirement  for  efficient  processing  by  the  cell’s  RNAi 
machinery.  All  effective  RNAi  prediction  tools  tend  to  choose  sequences  that  begin  with  a  U. 
This  is  thought  to  have  a  structural  basis  in  the  interaction  between  the  RNA  and  Argonaute,  the 
core  of  the  RNAi  effector  complex.  The  5’  residue  of  the  RNA  has  been  shown  to  reside  in  a 
binding  pocket  which  favors  interaction  with  U.  However,  when  the  small  RNA  interacts  with 
Argonaute,  its  5’  end  is  not  available  for  pairing  with  the  target  RNA.  Even  though  the  5’  U 
contributes  to  RISC  binding,  it  is  irrelevant  to  target  recognition.  Thus  we  tested  the  idea  that  we 
could  expand  available  sequence  space  by  predicting  on  every  position  in  the  transcriptome  and 
changing  the  small  RNA  guide  that  would  pair  to  that  site  so  that  it  began  with  a  5’  U.  This  is 
henceforth  referred  to  as  the  “1U-strategy”.  Simulation  of  this  1U-strategy  compared  to  the 
same  predictions  lacking  1U  produced  higher  potency  scores  for  1U  constructs.  Importantly, 
this  improvement  also  enabled  predictions  on  small  genes  with  a  limited  number  of  potential 
target  sequences. 
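The 1U-strategy described above can be illustrated with a minimal sketch. It assumes a guide that is fully complementary to its target site and simply overwrites the guide's 5' nucleotide with U. The function names and the example sequence are hypothetical; this is not the shERWOOD implementation, and its potency scoring is not reproduced here.

```python
# Illustrative sketch of the 1U-strategy; not the shERWOOD implementation.
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def guide_for_site(target_site):
    """Return the guide (5'->3') fully complementary to an RNA target site (5'->3')."""
    site = target_site.upper().replace("T", "U")
    return "".join(COMPLEMENT[base] for base in reversed(site))


def apply_1u(guide):
    """Force a 5' U on the guide and report whether the base had to be switched.

    Position 1 of the guide sits in the Argonaute binding pocket and does not
    pair with the target, so all other positions are left untouched.
    """
    if guide[0] == "U":
        return guide, False
    return "U" + guide[1:], True


# Hypothetical 22-nt target site, used only to show the mechanics.
site = "AGCUUGACCGAUAGGCAUCGAC"
native = guide_for_site(site)
converted, switched = apply_1u(native)
print(native, converted, switched)
```

Only the first guide position changes, so the converted guide still pairs with the endogenous target site at every position that contributes to target recognition.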

To  experimentally  validate  the  1U-strategy,  we  performed  our  ‘sensor’  assay  using  a 
library  which  targeted  a  set  of  2000  genes  (“druggable”)  with  the  top  15  predicted  shRNA  per 
gene.  The  sensor  constructs  contained  target  sites  with  the  endogenous  base,  while  the 
shRNAs  were  either  with  or  without  the  1U  conversion.  Our  sensor  assay  screens  for  highly 
potent  shRNAs  in  a  highly  parallel  and  high-throughput  fashion.  Distribution  of  the  data 
indicated  that  approximately  50%  of  all  the  shRNAs  were  strong  or  very  strong  (knockdown 
efficiency >75%). When the native and artificial 1U shRNA data were plotted with their score
distributions, we observed a significant reduction in efficacy of the non-native 1U shRNAs. We
therefore stratified the 1U shRNAs based on their endogenous 5’ nucleotide and found that only
a subset of shRNAs performed well (endogenous 1U shRNAs) when a 1U-switch was made.
Using this dataset, a new shERWOOD algorithm module was developed that could select for
the strongest endogenous 1U shRNAs and identify which endogenous 1C, 1G, and 1A shRNAs
are likely to yield potent 1U-converted molecules.

Since the previous update, we completed the construction of the human V4 library,
which now has 88,275 shRNAs targeting 18,651 genes (with 16,681 genes represented by at
least three shRNAs, 1,518 genes by two shRNAs, and 452 genes by one shRNA per
gene). The mouse V4 library currently has 58,113 shRNAs in total, targeting 18,769 genes (with
11,936 genes represented by three or more shRNAs).

RNAi  screening  of  breast  cancer  cell  line  models  for  new  therapeutic  targets 

Approximately 20% to 25% of invasive breast cancers exhibit overexpression of the
human epidermal growth factor receptor 2 (HER2), a receptor tyrosine kinase. As elevated HER2
levels  are  associated  with  reduced  disease-free  and  overall  survival  in  metastatic  breast  cancer, 
therapeutic  strategies  have  been  developed  to  target  this  oncoprotein.  Trastuzumab,  a 
recombinant  humanized  monoclonal  antibody  directed  against  an  extracellular  region  of  HER2, 
was  the  first  HER2-targeted  therapy  approved  for  treatment  of  HER2-overexpressing  metastatic 
breast  cancer.  This  drug  is  active  as  a  single  agent  and  in  combination  with  adjuvant 
chemotherapy  (either  in  sequence  or  in  combination)  in  HER2-positive  breast  cancers. 
However, the objective response rates to trastuzumab mono-therapy were low (12% to 35%),
with a median duration of nine months, suggesting that a majority of HER2-overexpressing
tumors demonstrated de novo resistance. Phase III trials revealed that the combination of
trastuzumab  and  paclitaxel  or  docetaxel  could  increase  response  rates,  time  to  disease 
progression,  and  overall  survival  compared  to  trastuzumab  mono-therapy.  For  HER2-positive 
patients  who  had  not  received  prior  chemotherapy,  the  median  time  to  progression  in  response 
to  trastuzumab  as  single-agent  was  less  than  five  months.  In  patients  who  received  trastuzumab 
and  chemotherapy,  the  median  time  to  progression  was  7.5  months.  Thus,  the  majority  of 
patients  who  achieve  an  initial  response  to  trastuzumab-based  regimens  develop  resistance 
within  one  year.  Elucidating  the  molecular  mechanisms  underlying  acquired  resistance  to 
trastuzumab  is  essential  for  improving  the  survival  of  HER2-positive,  metastatic  breast  cancer 
patients. 

Our  goal  is  to  uncover  mechanisms  underlying  the  basis  of  acquired  trastuzumab 
resistance  and  to  find  genes  that  can  be  targeted  pharmacologically  to  reverse  drug  resistance. 
We  have  obtained  several  tumor-derived  cell  line  models  of  acquired  trastuzumab  resistance 
from Dr. Dennis Slamon (UCLA). We performed genome-wide shRNA screens using our 3rd-generation
human shRNA library in the presence and absence of trastuzumab to uncover
shRNAs  that  would  sensitize  the  resistant  cell  lines  to  the  drug. 


Status of genome-wide RNAi screens on HER2-positive, trastuzumab resistance (acquired)
models:

Cell line                   Screening condition     Status
SKBR3 (drug sensitive)      No drug                 Screen completed. Samples sequenced. Data analyzed.
SKBR3 (drug sensitive)      Trastuzumab 15 µg/ml    Screen completed. Samples sequenced. Data analyzed.
SK-TR (drug resistant)      No drug                 Screen completed. Samples sequenced. Data analyzed.
SK-TR (drug resistant)      Trastuzumab 15 µg/ml    Screen completed. Samples sequenced. Data analyzed.
EFM192A (drug sensitive)    No drug                 Screen completed. Samples sequenced. Data analyzed.
EFM192A (drug sensitive)    Trastuzumab 15 µg/ml    Screen completed. Samples sequenced. Data analyzed.
EFM-TR (drug resistant)     No drug                 Screen completed. Sample sequencing in progress.
EFM-TR (drug resistant)     Trastuzumab 15 µg/ml    Screen completed. Sample sequencing in progress.


To identify genes conferring secondary (acquired) resistance to trastuzumab from the
datasets, we set a cutoff of FDR<0.25 and filtered for genes that were depleted only upon
trastuzumab treatment in the drug resistant line, SKTR, but not in either of the two drug
sensitive lines, SKBR3 and EFM192A. This produced a list of 25 genes (Figure 1: Heatmap of
the 25 genes), which included those from PI3K-mTOR signaling (PI4K2A, Raptor, Insulin
receptor, and EIF4A), RNA processing (PRPF8, U2AF1, and LSM6), mitotic checkpoint
(BUB1B), and genes of relatively less well-known function. Identification of the insulin receptor
and members of the PI3K-mTOR signaling pathway fulfills our expectation of finding these
genes in this screen since they are known to be functionally associated with trastuzumab
resistance. However, TNFSF11/RANKL (ligand of the receptor activator of nuclear factor kappa
B), a gene of significant relevance to breast cancer, was also found in the screen as one of the two
most highly depleted hits.
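The filter described above can be sketched as follows. The sketch assumes that each gene has one FDR value per cell line for depletion under trastuzumab relative to no drug. The data structure, the gene labels, and the numbers are hypothetical placeholders and are not results from the screens.

```python
FDR_CUTOFF = 0.25
RESISTANT = "SKTR"
SENSITIVE = ("SKBR3", "EFM192A")


def resistant_specific_hits(fdr_by_gene, cutoff=FDR_CUTOFF):
    """Genes depleted with drug in the resistant line but in neither sensitive line."""
    hits = []
    for gene, fdr in fdr_by_gene.items():
        in_resistant = fdr[RESISTANT] < cutoff
        in_sensitive = any(fdr[line] < cutoff for line in SENSITIVE)
        if in_resistant and not in_sensitive:
            hits.append(gene)
    return sorted(hits)


# Placeholder values for illustration only.
example = {
    "GENE_A": {"SKTR": 0.05, "SKBR3": 0.60, "EFM192A": 0.80},  # resistant-specific
    "GENE_B": {"SKTR": 0.10, "SKBR3": 0.10, "EFM192A": 0.90},  # also depleted in a sensitive line
    "GENE_C": {"SKTR": 0.40, "SKBR3": 0.70, "EFM192A": 0.50},  # fails the cutoff everywhere
}
print(resistant_specific_hits(example))
```

Requiring absence of depletion in both sensitive lines is what restricts the list to candidates tied to the resistant state rather than to general fitness genes.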


Figure 1. Heatmap of shRNAs (FDR<0.25) for gene targets that sensitize drug resistant (SKTR) and not drug sensitive cell line models (SKBR3 and EFM192A). These are potentially novel candidates for trastuzumab combination therapy.

Figure 2. Competition assays to validate selected hits. Cell line: SKTR (Drug resistant), Drug = Trastuzumab.

Figure 3. Competition assays to validate selected hits. Cell line: SKBR3 (Drug Sensitive), Drug = Trastuzumab.

We validated 17 of the 25 targets (Figure 1) by using competition assays and the results
demonstrated that these target genes specifically sensitize the trastuzumab resistant cells
(SKTR) when silenced by RNAi (Figures 2 and 3). In the competition assay, shRNA-expressing
cells (GFP+) are mixed with an equal proportion of parental cells for each knockdown cell line.
Each cell line mixture is plated in triplicate and either left untreated or treated with drug. The
percentage of GFP+ cells remaining is then tracked over time. We prioritized
RANKL/TNFSF11 as the first target for further validation because a
clinically approved inhibitor, denosumab, exists. Xenografts of these cell lines are currently being
tested to validate whether RANKL is a target for trastuzumab sensitization in vivo. Ultimately,
our goal is to test whether this combination approach will reduce
tumorigenicity in human breast cancer cells from patients who are resistant to trastuzumab
therapy. In addition, we will continue to pursue the remaining targets in further validation
studies.
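One way to summarize the competition assay readout is sketched below. It assumes that the percentage of GFP+ cells is recorded per replicate at each time point, and it normalizes each condition to its own starting value before comparing treated with untreated wells. The function names and all numbers are hypothetical placeholders, not measured data.

```python
from statistics import mean


def relative_gfp(percent_gfp_by_day):
    """Mean % GFP+ at each time point, relative to the first time point."""
    days = sorted(percent_gfp_by_day)
    baseline = mean(percent_gfp_by_day[days[0]])
    return {day: mean(percent_gfp_by_day[day]) / baseline for day in days}


def drug_specific_depletion(untreated, treated):
    """Ratio of treated to untreated relative GFP+ fraction at each shared time point.

    Values below 1 indicate that knockdown cells are lost faster with drug.
    """
    rel_untreated = relative_gfp(untreated)
    rel_treated = relative_gfp(treated)
    return {day: rel_treated[day] / rel_untreated[day]
            for day in rel_untreated if day in rel_treated}


# Placeholder triplicate readings (% GFP+), not measured data.
untreated = {0: [50.0, 51.0, 49.0], 7: [48.0, 50.0, 49.0]}
treated = {0: [50.0, 49.0, 51.0], 7: [24.0, 26.0, 25.0]}
print(drug_specific_depletion(untreated, treated))
```

Dividing by the untreated trajectory separates a drug-dependent loss of knockdown cells from any growth disadvantage that the shRNA causes on its own.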

Denosumab is a fully human monoclonal antibody designed to inhibit RANKL for treating
various bone-related conditions. This drug was approved for use to treat giant cell tumor of the
bone,  for  breast  cancer  patients  on  adjuvant  aromatase  inhibitor  therapy  to  increase  bone  mass, 
for  postmenopausal  women  with  risk  of  osteoporosis,  and  for  the  prevention  of  skeletal-related 
events  in  patients  with  bone  metastases  from  solid  tumors.  Denosumab  is  highly  specific  as  it 
binds  human  RANKL,  but  not  murine  RANKL,  human  TRAIL,  or  other  human  TNF  family 
members.  RANKL  and  its  receptor  RANK  are  best  known  for  their  essential  function  in  bone 
remodeling  and  bone-related  pathologies  including  osteoporosis  and  arthritis.  The  dysregulation 
of  the  RANKL-RANK  system  is  the  major  cause  of  osteoporosis  in  post-menopausal  women. 
Appropriate  RANKL  signaling  is  also  required  for  the  formation  of  a  lactating  mammary  gland, 
and both RANKL and RANK are expressed under the control of progesterone, prolactin, and
parathyroid hormone-related protein (PTHrP). Recent data also implicate RANKL and
RANK  in  the  control  of  metastasis  of  breast  cancer  cells  to  the  bone  and  sex  hormone-driven 
primary  mammary  cancer.  Unfortunately,  synthetic  progesterone  derivatives  (progestins),  such 
as  medroxyprogesterone  acetate  (MPA),  used  in  hormone  replacement  therapy  and 
contraceptives  have  been  demonstrated  to  induce  the  RANKL-RANK  system,  providing  growth 
and  survival  advantage  to  damaged  mammary  epithelium,  a  requisite  for  tumor  initiation.  In 
addition,  recent  evidence  links  Her2  expression  to  RANKL-RANK  signaling.  Her2  expression  is 
increased  in  luminal  tumor  cells  grown  in  mouse  bone  xenografts,  as  well  as  in  bone 
metastases  from  patients  with  breast  cancer  as  compared  to  matched  primary  tumors.  The 
increase  in  Her2  protein  levels  was  not  due  to  gene  amplification,  but  rather  was  mediated  by 
RANKL  in  the  bone  environment. 

Epigenetic characterization of the mammary epithelial lineage

We  have  undertaken  a  full  epigenetic  characterization  of  the  mouse  mammary  epithelial  lineage 
from  nulliparous  and  parous  mice.  Our  goal  is  to  understand  how  DNA  methylation  patterns 
change  as  cells  differentiate  along  this  lineage  and  understand  what  discriminates  stem  cells 
from their mature progeny. This work has resulted in a publication, a
reprint of which is attached to this report. Please refer to this article for detailed descriptions of specific
aspects  of  the  research. 


KEY  RESEARCH  ACCOMPLISHMENTS 

• Development of a shRNA prediction algorithm (shERWOOD) that essentially predicts the
results of functional, sensor testing of shRNAs in silico.

• Completed construction of a sequence-verified, human, 4th-generation shRNA resource.

• Discovery of potential target genes to sensitize Her2-positive, trastuzumab-resistant cell
models of breast cancer.

• Defining the molecular hierarchy of mammary differentiation yielded refined markers of
mammary stem cells.


REPORTABLE  OUTCOMES 

• Developed a shRNA-specific prediction algorithm, called shERWOOD.

• Completed construction of a 4th-generation, human shRNA library consisting of 88,275
shRNAs and targeting 18,651 genes (with 16,681 genes represented by at least three
shRNAs).

• Constructed and sequence-verified 58,113 shRNAs (targeting 18,769 genes) of the 4th-generation,
mouse shRNA library.

• Both human and mouse 4th-generation shRNA resources are available to the scientific
community through Transomic Technologies Inc. (Huntsville, Alabama).

•  Published  manuscript: 

“Molecular  hierarchy  of  mammary  differentiation  yields  refined  markers  of  mammary 
stem  cells” 

Camila  O.  dos  Santos,  Clare  Rebbeck,  Elena  Rozhkova,  Amy  Valentine,  Abigail 
Samuels,  Lolahon  R.  Kadiri,  Pavel  Osten,  Elena  Y.  Harris,  Philip  J.  Uren,  Andrew  D. 
Smith,  and  Gregory  J.  Hannon. 

PNAS (2013), 110(18):7123-7130.

CONCLUSION 

We  are  continuing  to  make  progress  towards  our  aims  in  the  proposal.  During  this  past 
year,  we  have  had  some  very  promising  outcomes  with  the  completion  of  the  human  shRNA 
library and with most of the mouse shRNA library completed. A new shRNA algorithm for
predicting sensor-verified shRNAs has been developed. New mammary epithelial stem cell
markers  have  been  identified  with  implications  toward  breast  cancer  development. 

The partial purification of mouse mammary gland stem cells (MaSCs)
using combinatorial cell surface markers (Lin⁻CD24⁺CD29ʰCD49fʰ)
has improved our understanding of their role in normal development
and breast tumorigenesis. Despite the significant improvement
in MaSC enrichment, there is presently no methodology that
adequately isolates pure MaSCs. Seeking new markers of MaSCs,
we characterized the stem-like properties and expression signature
of label-retaining cells from the mammary gland of mice
expressing a controllable H2b-GFP transgene. In this system, the
transgene expression can be repressed in a doxycycline-dependent
fashion, allowing isolation of slowly dividing cells with retained
nuclear GFP signal. Here, we show that H2b-GFPʰ cells reside
within the predicted MaSC compartment and display greater
mammary reconstitution unit frequency compared with H2b-GFPⁿᵉᵍ
MaSCs. According to their transcriptome profile, H2b-GFPʰ
MaSCs are enriched for pathways thought to play important roles
in adult stem cells. We found Cd1d, a glycoprotein expressed on
the surface of antigen-presenting cells, to be highly expressed by
H2b-GFPʰ MaSCs, and isolation of Cd1d⁺ MaSCs further improved
the mammary reconstitution unit enrichment frequency to nearly
a single-cell level. Additionally, we functionally characterized a set
of MaSC-enriched genes, discovering factors controlling MaSC survival.
Collectively, our data provide tools for isolating a more precisely
defined population of MaSCs and point to potentially critical
factors for MaSC maintenance.

FACS  sorting  |  mammary  gland  transplant  |  shRNA  screen 

The murine mammary gland resembles, to some extent, the
human mammary gland in development, milk production,
and progression to carcinogenesis, making it an ideal system to
develop methodologies and form hypotheses of relevance to
women. The use of cell surface markers to isolate selected cell
types from mice has greatly enhanced our understanding of development
and our knowledge of molecular pathways and interactions
that influence it. Mammary gland stem cells (MaSCs) have
commanded attention because of not only their roles in the cycles
of gland morphogenesis but also their potential contribution in
tumor initiation. Full characterization of MaSCs, however, has
been hampered by their scarcity. Enrichment of the MaSC compartment
has, until now, been achieved by using a combination of
cell surface markers (Lin⁻CD24⁺CD29ʰCD49fʰ) (1, 2). Thus far,
these cells have been enriched to 1 MaSC per every 64 cells
stained Lin⁻CD24⁺CD29ʰ (1). This is sufficient to test for MaSC
repopulation capacity and to some extent, roles in tumorigenesis,
but this level of purity is less suitable for more complex molecular
analyses that define MaSCs and their properties.

Additional characterization of MaSCs has been achieved using
a transgenic mouse model expressing GFP under the control of the
s-ship promoter (3). This gene is expressed in embryonic and hematopoietic
stem cells but not differentiated cells (4). GFP⁺ cells in
this mouse model were shown to reside at the tips of the terminal
end buds, where MaSCs are believed to be located in these
developing mammary gland structures (3, 5). Transplantation of
the MaSC-enriched GFP⁺CD49fʰ cells improved the mammary
reconstitution unit (MRU) frequency to 1/48 cells, an increase over
the previously shown frequency for CD24⁺CD29ʰCD49fʰ cells.
Although very elegantly performed and enhancing our understanding
of MaSC localization, studies with this mouse model
did not achieve a greater enrichment for MaSCs using more conveniently
accessible markers, such as cell surface proteins.

Given  the  limitations  in  accurately  purifying  MaSCs,  we  sought 
to  devise  a  method  better  suited  for  identifying  this  population. 
Here,  we  describe  the  use  of  long-term  label  retention  to  increase 
the MRU frequency within MaSC-enriched CD24⁺CD29ʰ cells.
This  approach,  previously  applied  to  the  isolation  of  skin  stem 
cells  (6),  enables  the  identification  of  slowly  dividing  cells,  a 
characteristic  of  adult  stem  cells.  To  mark  slowly  dividing  cells, 
expression  of  the  H2b  histone,  linked  to  GFP,  is  regulated  by  a 
tetracycline  responsive  element  (TRE)  and  a  tet-controlled 
transcription  activator  (tTA)  under  the  endogenous  keratin  K5 
promoter  (K5tTA-H2b-GFP).  In  the  absence  of  tetracycline  or  its 
analog  doxycycline  (DOX),  the  tTA  binds  to  TRE  and  activates 
transcription  of  H2b-GFP.  Treatment  with  DOX  prevents  the 
tTA  binding  to  TRE,  and  transcription  of  H2b-GFP  is  terminated 
(6).  As  the  cell  divides,  newly  synthesized,  unlabeled  H2b  replaces 
the  H2b-GFP;  therefore,  the  more  slowly  dividing  cells  will  retain 
GFP  expression  for  an  extended  period. 
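The dilution logic behind label retention can be made concrete with a short sketch. It assumes an idealized chase: no new H2b-GFP is synthesized once DOX is given, labeled histone is split equally between daughter cells, and protein turnover is ignored. These simplifications and the function names are assumptions made for illustration, not measurements from the study.

```python
import math


def retained_fraction(divisions):
    """Fraction of the initial H2b-GFP signal left per cell after n divisions."""
    return 0.5 ** divisions


def divisions_to_reach(fraction):
    """Smallest number of divisions after which the signal is at or below `fraction`."""
    return math.ceil(math.log2(1.0 / fraction))


# Per-cell signal after 0 to 8 divisions under the idealized assumptions.
for n in range(9):
    print(n, retained_fraction(n))
```

Under these assumptions a cell that divided seven times during the chase keeps less than 1% of its starting signal, which is why only the most slowly dividing cells remain bright.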

We were able to improve the MaSC enrichment by isolating
GFP-retaining cells after a long-term inhibition of transgene
expression. We refer to these cells as H2b-GFPʰ MaSCs
(CD24⁺CD29ʰH2b-GFPʰ). Comparisons between expression profiles of
all mammary gland cell types suggested that H2b-GFPʰ MaSCs
differentially expressed several genes involved in pathways previously
described as playing roles in other adult stem cells. Additional
analysis of the H2b-GFPʰ MaSC expression signature
led to the identification of a cell surface marker that, combined
with conventional markers, resulted in the isolation of an MaSC
population with an elevated proportion of MRUs. In addition,
we performed a focused shRNA screen, targeting genes that
were differentially expressed in our newly characterized MaSC-enriched
cell population, revealing potential regulators of mammary
gland biogenesis. Overall, this work improves our ability to
purify MaSCs and provides valuable insights into their role in
mammary gland development and perhaps, even tumor initiation.

Results 

H2b-GFP  Label-Retaining  Cells  Enrich  for  MaSCs.  To  better  enrich  for 
the MaSC population, we assessed the feasibility of using mammary
gland label-retaining cells to select for MaSCs, given that
a slower division rate is an accepted characteristic of adult stem
cells. We adopted a system wherein expression of the H2b histone,
linked to GFP, is regulated by a TRE and a tTA under the
endogenous  keratin  K5  promoter  K5tTA-H2b-GFP  (a  gift  from 
Elaine  Fuchs,  Rockefeller  University,  New  York,  NY).  Keratin 
K5  is  expressed  in  cells  of  the  basal  compartment,  the  region 
considered  to  be  home  to  MaSCs  (7).  This  system  displays  some 
advantages  over  the  previous  gene  reporter-based  methods  used 
to  isolate  MaSCs,  because  it  takes  advantage  of  one  of  the  more 
general  properties  of  stem  cells:  their  relative  quiescence.  In 
support  of  the  use  of  this  mouse  model,  there  were  previous  hints 
that MaSC-enriched CD24⁺CD29ʰ cells display BrdU label-retaining
properties (1), although label-retaining populations
were  not  functionally  characterized. 

Initial experiments using the H2b-GFP mice assessed the expression
and distribution of GFP-positive cells in the adult
mammary gland (Fig. 1A). Histological sections revealed the
presence of several GFP⁺ cells located within structures resembling
the mammary gland ductal epithelium (Fig. 1B and Fig.
S1A, Upper). Treatment of H2b-GFP mice with DOX over a 12-wk
period, thus ceasing transcription of the H2b-GFP transgene, dramatically
reduced the number of cells expressing GFP. Notably,
those cells that remained GFP⁺ were located at the tips of the
terminal end buds. These distinct sites in the ductal epithelium
are the areas currently believed to be occupied by MaSCs (8) (Fig.
1C and Fig. S1A, Lower).

To complement this observation, under the hypothesis that
mammary gland label-retaining cells comprise a population of
potential MaSCs, we investigated the correlation between GFP
retention and expression of previously defined MaSC-enriched
cell surface markers, CD24 and CD29. Using FACS analysis, we
were able to subdivide the mammary gland (after depletion
of endothelial and hematopoietic cells as shown in Fig. S1B) into
three distinct cell compartments: luminal (CD24ʰCD29⁺), occupied
by luminal cells; basal (CD24⁺CD29ʰ), occupied by
myoepithelial cells and MaSCs; and stromal (CD24⁻CD29⁺) (1)
(Fig. 1D, Upper Left). The majority of GFP⁺ cells from a transgenic
H2b-GFP mouse off DOX could be categorized into either
basal or stromal compartments, with far fewer GFP⁺ cells occupying
the luminal compartment (Fig. 1D, Upper Right and Fig.
S1C, Left). After a 12-wk DOX chase, the overall proportion
of GFP⁺ cells decreased by more than one-half, and the
presence of a GFP⁺ luminal compartment was all but eliminated
(Fig. 1D, Lower Left and Fig. S1C, Center). Focusing on
GFP intensity (a measure that directly relates to the rate of cell
division), selection of only the brightest GFP⁺ cells (GFPʰ)
resulted in a greater proportion remaining in the CD24⁺CD29ʰ
basal compartment, whereas the stromal compartment was significantly
reduced after GFPˡᵒʷ cells were removed (Fig. 1D,
Lower Right and Fig. S1C, Right). This result suggests that the
most label-retaining cells reside within the basal compartment
and may represent the MaSC population.

The benefit of using GFP to test for label retention, as opposed
to BrdU, is that its detection does not require fixation and
staining. We were then able to test the biological differences,
using mammary gland transplants, between GFPʰ cells (H2b-GFPʰ
MaSCs) and GFP⁻ cells (H2b-GFP⁻ MaSCs) within the
MaSC-enriched compartment. Transplantation assays are a fundamental
criterion to evaluate stemness and have been used
previously for several tissues, including the mammary gland (1, 2, 9).
For these experiments, the inguinal glands were removed from
the endogenous tissue of prepubescent females before injection
of donor cells. Donor cells were harvested from mammary glands
of H2b-GFP mice after a 12-wk DOX chase, dissociated, lineage-depleted,
and sorted according to GFP intensity (Fig. S1D). Cells
(GFPʰ and GFP⁻) were then injected, and outgrowths from
donor cells were compared (by visualization of GFP⁺ epithelium)
12 wk posttransplantation. Given that the recipient animals
are not treated with DOX, all cells derived from the donor mice
will resume expression of the H2b-GFP transgene and give rise
to GFP⁺ outgrowths. MRU frequency was estimated according
to the previously described algorithm (10). Transplantation of
500 H2b-GFPʰ MaSCs (n = 5) gave rise to GFP⁺ epithelium in
all injected glands. This ability to reconstitute was still retained
when only 50 cells were transplanted (Fig. 1E). In contrast, only
one-half of the glands injected with 500 H2b-GFP⁻ MaSCs displayed
fluorescent outgrowths, decreasing to just 29% with injection
of 50 cells (Dataset S1). These results represent an
increase in the estimated frequency of MRUs from 1/70 cells, when
MaSC selection was performed using CD24⁺CD29ʰ alone (1), to
1/33 cells, with restriction to H2b-GFPʰ cells to further define
MaSCs. Comparatively, the MRU frequency among H2b-GFP⁻
MaSCs was estimated to be 1/149 (Dataset S1). Colony-forming
ability was also twofold greater for H2b-GFPʰ MaSCs when 500 of
these cells were seeded in a Matrigel (BD Bioscience) and cultured
for 7 d (Fig. S1D).
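A repopulating-unit frequency of this kind is commonly estimated from limiting-dilution transplants. The sketch below assumes the standard single-hit Poisson model, in which a gland injected with d cells shows an outgrowth with probability 1 - exp(-f·d), and finds the maximum-likelihood frequency f numerically. Whether the algorithm of reference 10 matches this model in detail is an assumption, and the example counts are hypothetical placeholders rather than the transplant data of the study.

```python
import math


def log_likelihood(freq, assays):
    """Single-hit Poisson log-likelihood; assays = [(cells, glands, positive), ...]."""
    total = 0.0
    for cells, glands, positive in assays:
        x = freq * cells
        if positive:
            total += positive * math.log(-math.expm1(-x))  # log P(outgrowth)
        total -= (glands - positive) * x                   # log P(no outgrowth)
    return total


def estimate_frequency(assays, lo=1e-9, hi=1.0, iterations=200):
    """Maximum-likelihood frequency by ternary search on log(freq).

    Needs at least one positive and one negative gland overall; otherwise the
    maximum lies on the boundary of the search interval.
    """
    a, b = math.log(lo), math.log(hi)
    for _ in range(iterations):
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if log_likelihood(math.exp(m1), assays) < log_likelihood(math.exp(m2), assays):
            a = m1
        else:
            b = m2
    return math.exp((a + b) / 2.0)


# Placeholder counts: (cells injected, glands injected, glands with outgrowth).
example = [(500, 6, 5), (50, 6, 2)]
freq = estimate_frequency(example)
print("estimated frequency: 1 in {:.0f} cells".format(1.0 / freq))
```

The search is done on the logarithm of the frequency because the likelihood of this model has a single maximum on that scale, so a simple interval-narrowing search is sufficient.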

Considered together, these data suggest that mammary gland
H2b-GFPʰ label-retaining cells represent a subset, if not an entire
population, of the MaSCs. Our experiments using a repressible
H2b-GFP transgene have built on previous knowledge regarding
the label-retaining properties of stem cells in the mammary gland
and confirmed that MaSC CD24⁺CD29ʰ cells reside mainly
within the H2b-GFPʰ label-retaining cell population. In addition
to these experiments, we also found that hormone-dependent
activation of MaSC proliferation and differentiation, triggered
by one complete cycle of pregnancy and involution in transgenic
H2b-GFP mice treated with DOX, completely depleted GFP⁺
cells, validating that H2b-GFPʰ cells truly represent a population
of slowly dividing cells rather than being a transgenic artifact.

It has been proposed that MaSCs comprise less than 5% of the
total basal compartment. Our findings support this notion given
that we find label-retaining H2b-GFPʰ cells to account for ~0.2%
of the total CD24⁺CD29ʰ population (Fig. S1D, Upper Right). We
also compared the distribution of H2b-GFPʰ-retaining cells with
expression of a recently identified marker for myoepithelial
progenitor-like cells, CD61. This marker was expressed by most of
the H2b-GFPˡᵒʷ population, whereas virtually all H2b-GFPʰ cells
were negative for CD61 staining, suggesting perhaps a unique
mammary gland cell differentiation pattern, where H2b-GFPʰ
label-retaining cells might occupy the top of the hierarchy.

H2b-GFP Cells Display a Stem Cell-Like Expression Signature. Having
established that H2b-GFPʰ MaSCs have reconstitution properties,
we next sought to determine where these cells fall in the mammary
differentiation hierarchy with regard to their gene expression patterns.
Using a combination of cell surface markers (1, 11), six distinct
cell types were isolated by FACS to a purity of >90%:
H2b-GFPʰ MaSCs (Lin⁻CD24⁺CD29ʰH2b-GFPʰCD61⁻), myoepithelial
progenitor-like cells (Lin⁻CD24⁺CD29ʰH2b-GFPˡᵒʷCD61⁺),
myoepithelial differentiated cells (Lin⁻CD24⁺CD29ʰH2b-GFP⁻CD61⁻),
luminal progenitor cells (Lin⁻CD24ʰCD29⁺CD61⁺CD133⁻),
luminal ductal cells (Lin⁻CD24ʰCD29⁺CD61⁻CD133⁺),
and luminal alveolar cells (Lin⁻CD24ʰCD29⁺CD61⁻CD133⁻)

Fig. 1. H2b-GFP label-retaining cells represent a population of MaSCs. (A) Experimental scheme. Mammary glands were harvested from K5tTa-H2b-GFP
transgenic mice either off (GFP pulse) or on DOX diet (GFP chase) and further processed for immunological staining or single-cell suspension FACS sorting. (B and
C) Tissue histology showing the distribution of H2b-GFP+ cells. Mammary glands from transgenic mice off and on DOX diet were harvested, defatted, embedded in agarose, and
imaged with two-photon microscopy. (B) Mice off DOX diet (GFP pulse) showing a broad distribution of GFP+ cells in mammary gland ductal structures. (C) After
a 12-wk DOX chase, H2b-GFPhi label-retaining cells became restricted to the edges of the ductal structures. (D) Flow cytometry profile of H2b-GFP+ cells. Upper Left
shows the profile of a lineage-depleted (CD45−, Ter119−, and CD31−) nontransgenic mammary gland according to CD24 and CD29 staining and highlights the
three cell compartments: luminal (CD24hiCD29+; comprising luminal progenitor cells, luminal alveolar cells, and luminal ductal cells); basal (CD24+CD29hi;
comprising myoepithelial progenitor cells, myoepithelial differentiated cells, and MaSCs); and stromal. Total GFP+ cells from H2b-GFP transgenic mice off DOX diet
(GFP pulse mice; Upper Right) displayed a similar cellular compartmental distribution with fewer luminal-type cells. The CD24CD29 cell profile of H2b-GFP+ cells
from GFP chase mice (on DOX) was analyzed using two strategies to define GFP-expressing cells. Lower Left displays CD24CD29 staining of total H2b-GFP+ cells,
whereas Lower Right shows the CD24CD29 staining of H2b-GFPhi cells. The focus on GFPhi cells, the most label-retaining cells, drastically decreased the cellular
content of all mammary gland compartments and retained a greater proportion of cells inside of the basal compartment, potentially representing MaSCs. (E)
Histological analysis of mammary gland H2b-GFPhi MaSC outgrowths. Cleared fat pads from prepubescent female mice were injected with either total H2b-GFP−
MaSCs (CD24+CD29hiGFP− cells) or H2b-GFPhi MaSCs (CD24+CD29hiGFPhi cells), harvested 12 wk after transplantation, embedded in agarose, stained with antibodies,
and imaged on a Zeiss 710 LSM (Zeiss) confocal microscope. Images display outgrowths of two distinct glands injected with H2b-GFPhi MaSCs.


(Fig.  2A).  The  myoepithelial  progenitor-like  cells  were  defined 
by  expression  of  CD61  as  a  positive  cell  surface  marker  and  their 
positioning  as  the  second  most  label-retaining  cell  population. 

Hierarchical clustering of combined RNAseq replicates split
mammary gland cells into two main branches: the basal compartment,
comprising myoepithelial progenitor cells, myoepithelial
differentiated cells, and H2b-GFP MaSCs, and the luminal
compartment, with luminal progenitor cells and differentiated
cells (Fig. 2B). As predicted by prior characterization of MaSCs
(1), we found the expression profile of H2b-GFP MaSCs to be
more closely related to the expression profile of myoepithelial cells
than luminal cells; however, H2b-GFP MaSCs were still an outgroup
compared with other cells in this cluster. Analysis over all
mammary gland cell types yielded several hundred genes differentially
expressed among all cell types (Fig. 2B), spanning diverse
gene ontology groups and pathways (Dataset S2). More specifically,
genes differentially expressed in H2b-GFP MaSCs were
enriched in G protein-coupled receptors and pathways involving
Wnt/β-catenin signaling, areas previously described to play fundamental
roles in other adult stem cells (12). Differential

Fig. 2. H2b-GFPhi MaSCs display a stem-like expression signature. (A) Sorting
strategy. We used a combination of four cell surface markers in addition to
H2b-GFP expression to segregate the lineage-depleted mammary gland cells
into six distinct cell types: H2b-GFPhi MaSCs (Lin−CD24+CD29hiH2b-GFPhiCD61−),
myoepithelial progenitor cells (Lin−CD24+CD29hiH2b-GFP−CD61+),
myoepithelial differentiated cells (Lin−CD24+CD29hiH2b-GFP−CD61−),
luminal progenitor cells (Lin−CD24hiCD29+CD61+CD133−),
luminal ductal cells (Lin−CD24hiCD29+CD61−CD133+),
and luminal alveolar cells (Lin−CD24hiCD29+CD61−CD133−).
For each library, two biological replicates were analyzed. (B)
Mammary gland differential expression heat map. Clustering of RPKM
profiles for the 100 genes with highest variance across all samples. Two main
cell clusters were generated according to the expression patterns of analyzed
genes: luminal- (progenitor, alveolar, and ductal cells) and basal-type
cells (H2b-GFPhi MaSCs, progenitors, and differentiated cells). Note that
H2b-GFPhi MaSCs cluster with other basal compartment cells but have an
expression signature distinct from the other two cell types in this cluster.


expression patterns were confirmed on four genes by performing
quantitative RT-PCR on H2b-GFP MaSCs (n = 4 individually
sorted samples) and myoepithelial progenitor cells (n = 3 individually
sorted samples) (Fig. S2). mRNA for the Cd24 and
Cd29 genes was quantified as control, because all myoepithelial
cells displayed similar levels of expression for these genes.

These  results  further  confirmed  that  mammary  cell  types  could 
be  differentiated  based  on  their  gene  expression  profiles,  allowing 
us  to  use  these  profiles  to  select  cell  type-specific  genetic  identifiers. 
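
As a rough illustration of the variance-based gene selection used for the heat map in Fig. 2B, the sketch below ranks genes by variance across samples and computes a correlation distance between sample profiles, which a hierarchical clustering routine could then take as input. The function names, the log2(x + 1) transform, the choice of Pearson correlation distance, and the toy RPKM values are all assumptions made for illustration; they are not taken from the analysis pipeline used in this study.

```python
import math
from itertools import combinations
from statistics import pvariance


def top_variance_genes(rpkm, n_genes):
    """Return the n_genes gene names with the highest variance across samples.

    rpkm maps gene name -> list of RPKM values, one per sample.
    """
    ranked = sorted(rpkm, key=lambda gene: pvariance(rpkm[gene]), reverse=True)
    return ranked[:n_genes]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def sample_distances(rpkm, genes, samples):
    """Correlation distance (1 - r) for every pair of samples.

    Profiles are log2(RPKM + 1) over the selected genes (an assumed transform).
    """
    profiles = {
        name: [math.log2(rpkm[gene][i] + 1) for gene in genes]
        for i, name in enumerate(samples)
    }
    return {
        (a, b): 1 - pearson(profiles[a], profiles[b])
        for a, b in combinations(samples, 2)
    }


# Toy values only; not data from this study.
samples = ["basal_1", "basal_2", "luminal_1", "luminal_2"]
toy_rpkm = {
    "geneA": [50, 48, 2, 3],
    "geneB": [1, 2, 40, 42],
    "geneC": [10, 10, 10, 10],
    "geneD": [30, 5, 6, 28],
}
selected = top_variance_genes(toy_rpkm, 3)
distances = sample_distances(toy_rpkm, selected, samples)
```

In this toy example, the two basal samples end up closer to each other than to either luminal sample, which is the kind of two-branch split the clustering in Fig. 2B reports.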

Additional Cell Surface Marker to Improve MaSC Purification. Because
of limitations on the ability to purify MaSCs to homogeneity
based on currently used cell surface markers, we searched
for new surface markers that might identify MaSCs using the
RNAseq data. We first generated a list of ~500 genes that encode
cell surface markers according to their gene ontology
term function (e.g., basolateral membrane, cell surface, membrane
protein, or basement membrane). This list was further
reduced to genes with high expression levels in the H2b-GFP
MaSCs. Five candidate cell surface proteins came out of this
analysis (Fig. S3A): CD1d, a glycoprotein expressed on the surface
of various mouse and human antigen-presenting cells (13);
CD59a, a regulator of the membrane attack complex (14); CD22,
a regulatory lectin involved in repressing hyperactivation of the
immune system (15); CD93, a C-type lectin involved in cell-cell
adhesion processes (16); and CD74, an HLA class II protein,
part of the major histocompatibility complex (17). Antibodies
against CD1d, CD59a, and CD22 positively stained a distinct
population of cells contained within the Lin− MaSC CD24+CD29hi
cells (Fig. 3A), whereas antibodies against the proteins
CD93 and CD74 failed to stain any mammary gland cells. We
further tested CD1d MaSCs (CD24+CD29hiCD1d+), CD59a
MaSCs (CD24+CD29hiCD59a+), and CD22 MaSCs (CD24+CD29hiCD22+)
for their ability to grow colonies in Matrigel culture.
Two populations, CD1d MaSCs and CD59a+ MaSCs [representing
1% and 4%, respectively, of the total MaSC (CD24+CD29hi)
population], displayed an approximately twofold increase
in colony-forming ability compared with the total MaSC population
(Fig. S3B). However, we found CD1d MaSCs to have a
greater colony-forming ability compared with CD59a+ MaSCs,
with one-half as many cells needed to produce the same number
of colonies (200 and 500 cells, respectively, seeded on Matrigel).
Additional analysis showed that all CD1d+ cells from the MaSC-enriched
CD24+CD29hi population were also CD59a+, whereas
the remaining majority of CD59a+ cells from the MaSC-enriched
CD24+CD29hi population was negative for CD1d expression (Fig.
3B). Based on the enhanced colony-forming abilities of CD1d
MaSCs over CD59a+ MaSCs and the overlap of the two markers
within the CD1d+ populations, we decided to pursue the experiments
using CD1d as an MaSC marker.

We next sorted CD1d+ MaSCs for RNAseq and compared
their gene expression profile with those of the cell populations
described in Fig. 2. Cluster analysis of all RNAseq libraries suggests
that the CD1d MaSC expression signature is closer to the expression
pattern found for H2b-GFP MaSCs than to that of any other
cell type (Fig. S3C). These results suggest that the
common expression signature between CD1d+ MaSCs and H2b-GFPhi
MaSCs may define the stem cell state of mammary gland cells.

To ask whether CD1d+ MaSCs are slowly dividing cells, we
performed BrdU label retention experiments. We injected BrdU
into eight prepubescent female mice (3 wk old) over 5 consecutive
d. Cells were harvested on the day of the last injection (week 0)
from one-half of the mice and after 12 wk from the remaining mice
(Fig. 3C). FACS analysis showed that ~20% of the total MaSC
population retained BrdU, and up to 60% of CD1d MaSCs were
BrdU-retentive. This result adds confidence to the use of CD1d as
a cell surface marker to represent the H2b-GFP MaSCs, because
CD1d MaSCs are the most label-retaining cells and perhaps, therefore, the most
enriched for stem-like cells within the mammary gland.

We then went on to repeat the mammary gland reconstruction
assays, but this time, we compared CD1d+ MaSC transplantation
efficiency with the transplantation efficiency displayed by the
total MaSC (Lin−CD24+CD29hi) population using cells from the
H2b-GFP mice off DOX. Comparing donor-derived outgrowths
(identifiable by GFP expression) between injection with CD1d+
MaSCs and injection with total MaSCs, we found that, despite
bringing the injected cell number down to single digits, CD1d+
MaSCs effectively gave rise to GFP outgrowths in the majority of
graft recipients (Fig. 3D and Dataset S3). This result gave a predicted
MRU frequency of ~1/8 CD1d MaSCs compared with the
1/44 MRU frequency from total MaSCs (Dataset S3). CD1d+ MaSCs
collected by FACS from a reconstructed gland also effectively
gave rise to a gland when serially transplanted into another
mouse, showing that these cells also have the capacity to
self-renew in addition to regenerating the gland (Fig. 3E).
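
MRU frequencies of this kind are typically derived from limiting dilution data under a single-hit Poisson model, in which a graft of d cells takes with probability 1 − exp(−f·d) for a repopulating-cell frequency f. The sketch below finds the maximum-likelihood f by a simple search. It is only a minimal illustration of the idea: the function names, the search bounds, and the example counts are hypothetical, and it does not reproduce the ELDA software cited in Materials and Methods, which additionally reports confidence intervals and model checks.

```python
import math


def log_likelihood(freq, groups):
    """Log-likelihood of a frequency under the single-hit Poisson model.

    groups is a list of (cells_injected, n_transplants, n_positive) tuples.
    """
    total = 0.0
    for dose, n_tested, n_positive in groups:
        p_take = 1.0 - math.exp(-freq * dose)
        if n_positive:
            total += n_positive * math.log(p_take)
        # log(1 - p_take) simplifies to -freq * dose
        total += (n_tested - n_positive) * (-freq * dose)
    return total


def estimate_frequency(groups, low=1e-6, high=1.0, iterations=200):
    """Maximum-likelihood frequency by ternary search on [low, high].

    The log-likelihood is concave in freq, so the search converges to the
    maximum when it lies inside the bounds. If every graft is positive or
    every graft is negative, the estimate runs to a bound instead.
    """
    for _ in range(iterations):
        third = (high - low) / 3.0
        left, right = low + third, high - third
        if log_likelihood(left, groups) < log_likelihood(right, groups):
            low = left
        else:
            high = right
    return (low + high) / 2.0


# Hypothetical transplant counts, not data from this study.
example = [(5, 10, 4), (25, 10, 9), (100, 10, 10)]
freq = estimate_frequency(example)
print(f"estimated MRU frequency: 1 in {1 / freq:.1f} cells")
```

For a single dose group the estimate has a closed form, f = −ln(1 − positives/transplants)/dose, which is a convenient sanity check for the search.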

MaSC-Focused shRNA Screen. To identify genes and pathways
necessary for the maintenance of MaSC reconstitution potential,
we selected a set of abundantly and differentially expressed genes
from RNAseq libraries of H2b-GFPhi MaSCs and CD1d+ MaSCs
and targeted them in shRNA-mediated knockdown experiments.
We used shRNAs identified by a prediction algorithm developed
in our laboratory, taking, on average, two hairpins per gene.
Hairpins targeting nondifferentially expressed genes were also
included as well as depletion control hairpins targeting Rpa3 and
Polr2b and neutral control hairpins targeting Firefly luciferase
and Renilla luciferase. All genes were targeted in a one-by-one
approach in an assay lasting ~3 wk (Fig. S4A).

Fig. 3. CD1d is an additional cell surface marker for purification of MaSCs. (A) FACS analysis of MaSC cell surface markers. Total MaSCs (CD24+CD29hi cells)
were additionally segregated according to the expression of the cell surface markers CD1d, CD59a, or CD22. (B) CD1d is expressed by a subset of CD59a+ cells.
Lin− mammary gland cells were stained with antibodies against CD24, CD29, CD1d, and CD59a and further analyzed on an LSRII Cell Analyzer (BD Bioscience).
The entire basal compartment (CD24+CD29hi) was selected and analyzed according to CD1d and CD59a expression. The majority of cells within the basal
compartment stained positive for CD59a, and CD1d+ cells fell mainly in the CD59a+ area. (C) CD1d MaSCs are the most label-retaining cells within the
mammary gland. Prepubescent mice were injected with BrdU (50 mg/kg body weight) for 5 d. Glands were either harvested from mice on the last day of
BrdU injection to evaluate the total BrdU incorporation (week 0) or harvested after a 12-wk BrdU chase. Single-cell suspensions were stained with antibodies
against CD24, CD29, and CD1d and analyzed on an LSRII Cell Analyzer (BD Bioscience). BrdU incorporation was measured in total MaSC (CD24+CD29hi) and
CD1d MaSC (CD24+CD29hiCD1d+) populations. (D and E) Histological analysis of mammary gland CD1d MaSC outgrowths. Cleared fat pads from
prepubescent female mice were injected with (D) either total MaSCs or CD1d MaSCs and (E) 25 CD1d MaSCs harvested from glands pretransplanted with
CD1d MaSCs. Glands were harvested 12 wk after transplantation and embedded in agarose, and endogenous GFP signal was imaged. Images display
outgrowths from two distinct glands injected with CD1d MaSCs (D) or secondary transplanted CD1d MaSCs (E).


shRNAs were introduced into the immortalized mammary
gland cell line, Comma-Dβ (18). These cells give rise to both
luminal and myoepithelial compartments in colony-forming and
transplantation assays, independent of the method of MaSC
enrichment (19-21). In addition, ~50% of Comma-Dβ cells stain
positive for CD1d, placing them in our improved MaSC isolation
profile (Fig. S4B).

Cells were monitored for GFP expression (as a proxy for
shRNA expression); a change in the proportion of GFP-expressing
cells would indicate a relevant gene function.
The majority of screened shRNAs did not alter GFP levels during
the 3-wk screening period (Fig. 4A), which could suggest that the
corresponding genes were not essential for growth maintenance
of Comma-Dβ cells. However, a distinct set of shRNAs altered

Fig. 4. Mammary gland focused screen. (A) One-by-one screen; 206 shRNAs,
covering ~56 genes, were tested. The solid line square shows the fold
change of shRNAs considered to be lethal, because GFP percent for these
cells decreased over time, whereas the dashed line square highlights data
from shRNAs considered to show survival preferences in cells, because GFP
percent increased over time. (B) Screen hits validation. Two new hairpins
were tested for each of four genes selected as lethal hits from the first screen. The chart
represents results of two independent experiments. *P = 0.05 by the t test.


the maintenance of GFP-expressing cells by either depleting
GFP+ cells (Fig. 4A, lethal shRNAs) or promoting expansion of
GFP+ cells (Fig. 4A, survival shRNAs) over time.
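
The screen readout described above can be summarized as a fold change in the percentage of GFP-positive cells between the start and the end of the assay. The following sketch classifies a hairpin from two such measurements. The function name, the use of a log2 ratio, and the twofold cutoff are hypothetical choices made for illustration; they are not the criteria used to draw the boxes in Fig. 4A.

```python
import math


def classify_hairpin(gfp_start, gfp_end, log2_cutoff=1.0):
    """Label one shRNA from %GFP+ cells at the start and end of the assay.

    log2_cutoff is a hypothetical threshold (1.0 corresponds to twofold).
    """
    if gfp_start <= 0 or gfp_end <= 0:
        raise ValueError("GFP percentages must be positive")
    log2_fc = math.log2(gfp_end / gfp_start)
    if log2_fc <= -log2_cutoff:
        return "lethal"      # GFP+ cells were depleted over time
    if log2_fc >= log2_cutoff:
        return "survival"    # GFP+ cells expanded over time
    return "neutral"


# Made-up measurements, not data from this study.
for name, start, end in [("hairpin_1", 40.0, 8.0), ("hairpin_2", 35.0, 37.0)]:
    print(name, classify_hairpin(start, end))
```

In practice, the cutoff would be set relative to the behavior of the depletion controls (Rpa3, Polr2b) and the neutral luciferase controls run in the same assay.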

We decided to further investigate a subset of genes that interfered
with Comma-Dβ growth, because our focus was to understand
the spectrum of genes that might block normal
mammary gland biogenesis. Among the selected genes were a
mucin-like gene (Muc4), a G protein-coupled receptor gene family
member (Grk4), and transcription factors (Mafk and Sltm). An
additional set of hairpins for these genes was rescreened in
Comma-Dβ cells with GFP levels followed for 10 d. No clear
effect on the percentage of GFP-positive cells was observed
when cells expressed the new shRNAs against Muc4 or the Renilla
luciferase control, whereas an shRNA-dependent response was
observed, according to GFP frequency, when the genes Grk4 and
Mafk were targeted (Fig. 4B). In addition, both new shRNAs
against the gene Sltm consistently decreased GFP-expressing
cells to levels comparable with the depletion achieved by Rpa3,
the lethal control. Interestingly, Sltm encodes a transcription
factor-like protein that binds both DNA (scaffold attachment
factor-box DNA binding motif) and RNA (RNA binding domain)
in response to estrogen levels (22). We are currently
investigating the consequences of loss of Sltm expression
during normal mammary gland development and tumorigenesis.

Discussion 

The ongoing interest in stem cells and, more recently, cancer
stem cells highlights the need for improvements in purification
and analysis of this rare but important population. Our previous
understanding of MaSCs has been clouded by the limited
capability to obtain a pure population devoid of contaminating,
more differentiated cells. Here, we took advantage of a previously
used system to identify relatively quiescent cells (6) in the
mammary gland. We propose that the label-retaining cells from the
K5tTa-H2b-GFP mouse represent a subset of active MaSCs,
displaying increased mammary gland reconstitution ability over
previously published cell populations identified as MaSCs.

Unlike previous methods, where cell selection is based on the
presence of constitutive fluorescence in cells (3, 23), the use of
a cell state-dependent GFP system provides a more biologically
relevant and reliable fluorescence signal. The extended time between
halting GFP expression and analysis, together with selection of only
the brightest cells, decreases the possibility that the GFP protein
might be detected in a cell cycling at a normal rate, despite the
fact that GFP expression is switched off. This system allowed
for a much more stringent selection process;
however, we acknowledge that there are limitations to using
this mouse model on a routine basis for enriching for MaSCs.
The need to use a transgenic mouse and the remaining, albeit reduced,
heterogeneity within the selected cells (evident from <100%
regrowth efficiency) illustrate the need for the cell surface
marker, CD1d, identified in our study.

CD1d is known to be expressed as a cell surface marker on
a variety of antigen-presenting cells and belongs to a cluster of
glycoproteins involved in T-cell antigen presentation (13). Because
we physically remove all hematopoietic cells using magnetic
beads before FACS, we are confident that this differentially
expressed marker does not simply reflect contaminating cells.
This statement is supported by the presence of CD1d+ cells
within the normal-like mammary gland cell line, Comma-Dβ. In
fact, 50% of these cells, isolated during midpregnancy, were
positive for CD1d when stained with two distinct antibodies
(Fig. S4B).

We, therefore, propose that CD1d is a genuine marker for
MaSCs and, when used in combination with the cell surface markers
CD24 and CD29, greatly enhances the purification of reconstituting
cells above and beyond those cells selected based on
label retention alone and those selected based on previously
published markers. We perhaps did not exhaust all of the possibilities
presented by our RNAseq data for the description of
novel MaSC markers, but our findings do support CD1d as being
a valuable component for purifying true MaSCs.

We found the proportion of CD1d+ cells (1%) within the basal
compartment to be greater than the proportion of H2b-GFPhi cells
(0.2%) in this same compartment. This observation brings to light
another drawback of relying solely on this particular label-retaining
mouse model and, in the same context, relying on GFP
expression of a gene reporter mouse to identify MaSCs. The
cytokeratin K5 (Krt5), for example, although shown to be
expressed by basal-type cells, may not be expressed by all cells in
this compartment. In addition, GFP expression may also be disrupted
in some cells, perhaps by suppression of the transgene
promoter; alternatively, some cells could fail to shut down GFP
expression on DOX treatment. Had we only selected cells based
on GFP expression from the K5 promoter, we would not be
selecting all, or solely, those cells capable of self-renewal. This
exclusion of MaSCs has been illustrated previously, where cells
negative for a reporter GFP were still able to proliferate and regenerate
a new gland (3), something that we also see to
a small degree with the K5tTa-H2b-GFP mouse model.

The identification of CD1d as a unique marker for this MaSC
population and the distinct transcriptome of the CD1d MaSCs
suggest that these cells perform a function distinct from progenitors
and more differentiated cells. Despite their gene expression
profile clustering more closely with myoepithelial cells,
they are still able to produce a new gland. It is unclear, however,
if all of the CD1d MaSCs are multipotent stem cells or if they
represent a combination of the recently described luminal and
myoepithelial unipotent MaSCs (23). Because we are not using
lineage tracing here, we cannot say for certain if all of the
injected CD1d MaSCs would give rise to both compartments
when allowed to repopulate the gland and if they are themselves
the precursors to the cells that are largely responsible for
tissue maintenance.

Identifying the genes involved in maintaining a stem cell and
its self-renewing capabilities is vital to furthering our understanding
of how these genes might be involved in abnormal
gland development and tumorigenesis. Our current knowledge
on this topic, however, is, at best, limited; until now, it has
been difficult to segregate myoepithelial cells and MaSCs, because
they share common cell surface markers and very similar
gene expression profiles (1, 24). The large number of shared
genes expressed among cells identified by standard markers
would mask any true differential patterns expressed by those
cells with self-renewing properties. CD1d MaSCs only become
divergent when the expression patterns of a relatively small
number of genes are considered, a fact that would be overlooked
without a more refined selection process. In addition to
improving gene profiling as a whole for this minority population,
the use of CD1d to isolate single cells for profiling could provide
clues to gene expression changes between hypothesized MaSC
states. For example, the complete loss of label-retaining cells
after pregnancy suggests that these cells have undergone a more
extensive process of cell division than in a virgin gland. However,
CD1d+ cells are still present, unaltered to some extent, and using
this marker, it would be possible to monitor gene expression
changes during pregnancy and involution.

It has been suggested that stem cells within the mammary gland
contribute in some way to the proposed notion of a cancer stem
cell. In mouse mammary tumor virus (MMTV)-Wnt1 and p53−/−
mice, for example, a preneoplastic mammary gland was seen to
have an increased number of functional MaSCs, and ectopic expression
of Wnt-1 enhanced the self-renewing capabilities of cells,
leading to cancers (24). CD1d itself has even been linked to breast
cancer. In one study, antibodies against CD1d, combined with
anti-death receptor 5 (anti-DR5), a TNF-related apoptosis-inducing
ligand (TRAIL) receptor, led to rejection of tumor growth after
injection of 4T1 tumor cells, a mouse breast cancer cell line (25),
into a syngeneic mouse fat pad (26). Whether this observation was
a result of the proposed interaction with natural killer cells or
a disruption of the ability of the cancer to self-renew remains to be
seen. The latter could be possible, because we show that the 4T1
mouse breast cancer cell line (Fig. S4C) and primary mouse breast
cells (27) (Fig. S4D) display a population of cells that is positive
for CD1d. In humans as well, CD1d plays some unknown role.
Down-regulation of CD1d expression has been shown to correlate
with increasing metastasis in a mouse breast cancer model (28)
and disease progression in multiple myeloma (29). Our own
studies have shown that CD1d is expressed as a cell surface
marker in some but not all of the human breast cancer cell lines
tested (Fig. S4E). Those cell lines that showed CD1d+ cells were
from basal-like breast cancers; luminal-like cancer cell lines,
however, showed no CD1d+ cells.

With the ability to now purify a more homogeneous self-renewing
population of MaSCs, it is possible to delve more deeply
into the biology of these cells. We have not only identified CD1d
as a marker of MaSCs but also used this information to draw out
gene targets for disrupting mammary gland development and
possibly, malignancies. It is also not yet known whether these specific cell
markers are a cause or effect of the ability of the cell to retain
stemness; interrupting their expression and studying the effect on
gland development and cancer are critical topics of future study.

Materials  and  Methods 

Mice. K5tTa-H2b-GFP heterozygote mice (6) were bred, and 20-d-old pups
were checked for GFP expression using the IVIS100 in vivo imaging system
(Caliper). CD-1 female mice were purchased from Charles River. Basal-like
mouse mammary gland tumors were obtained from the transgenic mouse
mammary tumor model C3-tag (27) (a gift from Mikala Egeblad, Cold Spring
Harbor Laboratory, New York). All experiments were performed in agreement
with and approved by the Cold Spring Harbor Laboratory Institutional
Animal Care and Use Committee.

Two-Photon Microscopy. Mammary glands were harvested and defatted by
three rounds of acetone treatment (20 min each). Defatted mammary glands
were embedded and imaged according to previously published methods (30).
In short, experiments were performed on a high-speed multiphoton microscope
with integrated vibratome sectioning (TissueCyte 1000; TissueVision,
Inc.). 3D scanning of 5-mm Z-volume stacks was achieved with a microscope
objective piezo (PI E-665 LVPZT amplifier and P-725 PIFOC long-travel objective
scanner), which translated the microscope objective with respect to
the sample. Each optical section was imaged as a mosaic of individual fields
of view equal to 0.83 × 0.83 mm and reconstructed post hoc using Fiji and
custom-written Matlab software.

Antibodies. Antibodies for flow cytometry were purchased from eBioscience
unless otherwise specified, and they include anti-CD24 eFluor® 450,
biotinylated and phycoerythrin (PE)-conjugated anti-CD45, biotinylated and
PE-conjugated anti-CD31, biotinylated and PE-conjugated anti-Ter119,
PE-Cy7-conjugated anti-CD29, FITC- and PE-conjugated anti-CD61,
allophycocyanin (APC)-conjugated anti-CD133, PerCP-Cy5.5- and PE-conjugated
anti-CD1d (clones 1B1 and K253, respectively; BioLegend), PE-conjugated
anti-CD22, monoclonal CD59a (Hycult Biotech), PE-conjugated human
anti-CD1d, 7-AAD viability staining solution (BioLegend), FITC-conjugated mouse
IgG, and PE-conjugated rabbit IgG. Antibodies for immunostaining were
chicken anti-GFP (Invitrogen), mouse monoclonal Cytokeratin 18 (SCTB),
anti-chicken-IgG-Alexa Fluor 647 (Invitrogen), and anti-mouse-IgG Alexa Fluor 568
(Invitrogen).

Mammary Gland Preparation. Mammary glands were harvested from young
female mice (6-10 wk) and dissociated according to a previously published
protocol (1). After dissociation, cells were resuspended in 1 mL MACS Buffer
(Miltenyi Biotec) and incubated with biotinylated anti-CD45, anti-Ter119,
and anti-CD31 antibodies for 20 min. Cells were washed with 10 volumes
MACS Buffer and further incubated with antibiotin magnetic microbeads
(Miltenyi Biotec). Labeled cells were loaded into a magnetic column attached
to a magnetic field (Miltenyi Biotec), and lineage-depleted flow-through
cells were collected and further stained.

Flow Cytometry. Cells were stained for 30 min at 4 °C with antibody mix in PBS
supplemented with 1% (vol/vol) FBS followed by wash with 10× volume PBS.
Cells were resuspended in PBS plus 1% (vol/vol) FBS and further stained with
7-AAD immediately before sorting or analysis. Cells were sorted using a FACS
ARIAII SORP (BD Bioscience). For cell analysis, an LSRII (BD Bioscience) cell
analyzer or MACSQuant (Miltenyi Biotec) was used. Data analysis was performed
using either FlowJo (Tree Star) or Diva (BD Bioscience).

Matrigel Colony Assay. Cells were sorted into 96-well plates containing 100 μL
chilled 50% (vol/vol) Matrigel Matrix (BD Bioscience), further transferred to
100% Matrigel Precoated Chamber Slides (Lab-Tek), and incubated at 37 °C
for 5 min. Complete Growth Media (1) was added to the chamber and
renewed every other day for 10 d. Colonies were counted using a Nikon
Eclipse T1 microscope (Nikon).

Mammary Gland Transplant. Cells were sorted into 96-well plates containing
30 µL 50% (vol/vol) Growth Factor Reduced Matrigel (BD Bioscience) and
0.01% (vol/vol) Trypan Blue (Sigma). Cells were injected into inguinal glands
of 3-wk-old females that had been cleared of endogenous epithelium.
Recipient glands were removed for evaluation 8-12 wk after cell injection.

Mammary Transplant Analysis. Frozen sections and/or agarose-embedded
sections were fixed with 4% paraformaldehyde (Sigma) for 20 min followed
by tissue permeabilization and blocking using 10% (vol/vol) goat serum
(Sigma). Paraffin-embedded sections were dewaxed and subjected to antigen
retrieval for 15 min in Trilogy buffer (Cell Marque) before blocking as
described above. Primary antibody staining was performed overnight at 4 °C
with constant agitation followed by three washes with 0.1% (vol/vol) Tween
20. Secondary antibody staining was carried out for 45 min at room
temperature with constant agitation followed by three washes with 0.1%
(vol/vol) Tween 20. Slides were mounted with ProLong Gold supplemented with
DAPI (Invitrogen). For immunohistochemistry detection of GFP-positive
outgrowths, the Ace IHC Detection Kit (Epitomics) was used according to
the manufacturer's instructions. Tissue sections were analyzed using either
the Nikon Eclipse Ti microscope (Nikon) or a Zeiss LSM 710 confocal
microscope. For whole-mount images, glands were harvested, spread atop a glass
slide, defatted, and stained with Carmine Aluminum solution prior to image
analysis. MRU frequency was estimated using the ELDA algorithm (10).
Mammary gland reconstitution was considered successful if, by the time
of analysis, at least one-third of the fat pad was repopulated with GFP+
structures.
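
As an illustration of the kind of estimate a limiting dilution analysis produces, the following sketch fits the single-hit Poisson model, in which a gland injected with d cells is reconstituted with probability 1 - exp(-f*d), where f is the MRU frequency. It is only a minimal maximum-likelihood sketch and not the ELDA software itself, which additionally reports confidence intervals and tests of model fit. The function name and the example counts are hypothetical.

```python
import math

def mru_frequency(assays, lo=1e-9, hi=1.0, iters=200):
    """Maximum-likelihood MRU frequency under a single-hit Poisson model.

    assays: list of (cells_injected, glands_tested, glands_reconstituted).
    Returns f, the estimated fraction of injected cells that are MRUs.
    """
    def score(f):
        # Derivative of the log-likelihood with respect to f.
        # It decreases monotonically in f, so its root can be bracketed.
        total = 0.0
        for dose, tested, positive in assays:
            p_negative = math.exp(-f * dose)
            p_positive = -math.expm1(-f * dose)
            total += positive * dose * p_negative / p_positive
            total -= (tested - positive) * dose
        return total

    if score(lo) <= 0 or score(hi) >= 0:
        raise ValueError("need at least one positive and one negative gland")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect on a logarithmic scale
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Hypothetical transplant series: (cells injected, glands tested, positive glands).
example = [(50, 10, 2), (200, 10, 6), (1000, 10, 10)]
f_hat = mru_frequency(example)
print(f"estimated MRU frequency: 1 in {1 / f_hat:.0f} cells")
```

With a single dose the estimate reduces to the closed form f = -ln(1 - r/n)/d, which is a convenient sanity check on the numerical solution.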

RNAseq Library Preparation. Cells were sorted into Eppendorf tubes filled with
TRIzol LS (Invitrogen), and RNA purification was performed according to
the manufacturer's instructions. DNase-free RNA samples were used for the
preparation of double-stranded cDNA libraries using the Version 1 Ovation
RNAseq System (NuGEN). cDNA libraries were phosphorylated, adenylated,
and ligated to Illumina adapters followed by PCR enrichment. Single-ended
sequencing was performed for 36 cycles on Illumina GAII instruments (Illumina).

RNAseq Mapping and Analysis. We used the Refseq transcriptome (mm9
mouse assembly) downloaded through the University of California, Santa
Cruz (UCSC) Table Browser (31). Reads were mapped in two stages: first, they
were mapped to sequences constructed using all annotated Refseq exons
with overlapping exons collapsed, and second, they were mapped to all
possible junctions formed from all pairs of exons for the same gene.
Mapping was done with RMAP (32) and allowed up to three mismatches in 36
bases. Reads mapping ambiguously (including mapping to an exon and
a junction) were discarded. For each Refseq transcript, we counted the
number of reads with a mapping location that was inside the transcript's
exons (allowing a given read to be counted for two distinct transcripts as
long as the location is unique) or through one of the transcript's junctions.
Reads per kilobase per million (RPKM) calculations discarded duplicate reads
and corrected gene size for the portion of the gene that cannot be uniquely
mapped. Differential expression between two RNAseq experiments was
computed using a 2 × 2 contingency table and either a χ² statistic or Fisher
exact test to obtain a P value for differential expression. Briefly, the
contingency tables contained, for each gene, the counts of reads mapping into
the gene and the counts of reads mapping outside the gene for both
experiments. The P values were corrected for multiple testing using the
Bonferroni correction. The genes that remained were called as differentially
expressed (corrected P < 0.01), and rankings for differentially expressed
genes were based on ratios of RPKM values.
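
The expression and differential-expression calculations described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code used for the analysis: the function names are hypothetical, only the Fisher exact variant of the test is shown, and the two-sided P value sums the probabilities of all tables no more likely than the observed one.

```python
import math

def rpkm(reads_in_gene, mappable_length_bp, total_mapped_reads):
    """Reads per kilobase of uniquely mappable gene length per million mapped reads."""
    return reads_in_gene * 1e9 / (mappable_length_bp * total_mapped_reads)

def _log_choose(n, k):
    # log of the binomial coefficient, stable for large read counts
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def fisher_exact_two_sided(in_a, out_a, in_b, out_b):
    """Two-sided Fisher exact P value for the 2 x 2 table
    [[in_a, out_a], [in_b, out_b]], where 'in' counts reads mapping into
    the gene and 'out' counts reads mapping elsewhere in each experiment."""
    row_a, row_b = in_a + out_a, in_b + out_b
    col_in = in_a + in_b
    total = row_a + row_b

    def log_prob(x):
        # hypergeometric probability of x in-gene reads in experiment A
        return (_log_choose(row_a, x) + _log_choose(row_b, col_in - x)
                - _log_choose(total, col_in))

    observed = log_prob(in_a)
    lowest, highest = max(0, col_in - row_b), min(row_a, col_in)
    p = sum(math.exp(lp) for lp in map(log_prob, range(lowest, highest + 1))
            if lp <= observed + 1e-7)
    return min(1.0, p)

def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical counts for two genes in experiments A and B.
totals = (1_000_000, 1_200_000)
genes = {"geneX": (400, 90), "geneY": (150, 170)}
raw = [fisher_exact_two_sided(a, totals[0] - a, b, totals[1] - b)
       for a, b in genes.values()]
for name, p in zip(genes, bonferroni(raw)):
    print(name, "differentially expressed" if p < 0.01 else "not called")
```

Genes passing the corrected threshold would then be ranked by the ratio of their RPKM values in the two experiments.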

Quantitative PCR. Cells were sorted into 96-well plates containing 30 µL
Cell-To-Ct lysis buffer (Ambion). cDNA synthesis was performed according to the
manufacturer's instructions. Real-time PCR was performed using specific
TaqMan probes (Applied Biosystems) for each gene and Gapdh mRNA as an
endogenous control. Samples were run on a 7900 Real-Time PCR System
(Applied Biosystems).
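
The text does not state how Ct values were converted to expression levels. Assuming the common comparative Ct approach with Gapdh as the reference and roughly 100% amplification efficiency, normalization could be sketched as below; the function names and Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_gapdh):
    """Expression of a target relative to Gapdh, assuming the amount of
    product doubles every cycle: 2^-(Ct_target - Ct_Gapdh)."""
    return 2.0 ** -(ct_target - ct_gapdh)

def fold_change(ct_target_a, ct_gapdh_a, ct_target_b, ct_gapdh_b):
    """Gapdh-normalized expression in sample A divided by that in sample B."""
    return (relative_expression(ct_target_a, ct_gapdh_a)
            / relative_expression(ct_target_b, ct_gapdh_b))

# Hypothetical Ct values for one gene in two sorted populations.
print(fold_change(22.0, 20.0, 24.0, 20.0))
```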

BrdU Experiment. BrdU label-retaining experiments were performed using the
BrdU-APC Flow Kit (BD Bioscience); 3-wk-old female mice were injected with
BrdU (once per day for 5 consecutive days, 50 mg/kg body weight), and
mammary glands were harvested at specified time points. Mammary gland cells
were prepared according to the BrdU kit manufacturer's recommendations.
Cells were analyzed with an LSRII Cell Analyzer (BD Bioscience), and 1 million
cells were recorded per sample. For each experiment (n = 2), three glands
were analyzed at week 0 (last day of BrdU injection), and three glands were
analyzed at week 12 after BrdU injection.

Knockdown Experiment. shRNAs against 56 selected genes were pulled and
transferred from pGIPZ (LMN vector) lentiviral backbone (Open Biosystems) to
MSCV-miR30-PGK-NEO-IRES-GFP retroviral backbone (a gift from Christopher
R. Vakoc, Cold Spring Harbor Laboratory, New York). Plasmid was transfected
into Plat-E cells (33) using Lipofectamine 2000 and vesicular stomatitis virus
G-protein (VSVG), and virus was collected 24 and 36 h posttransfection. Cells
were infected by spin infection and allowed 2 d for recovery. GFP levels were
measured using a MACSQuant Cell Analyzer (Miltenyi Biotech) from 10,000
cells. Hairpins used in validation experiments were ordered as
oligonucleotides from Integrated DNA Technologies (IDT) and used as the template for
PCR reactions using KOD hot-start polymerase (EMD Millipore) and the primers
5MIR (5'-CAGAAGGCTCGAGAAGGTATATTGCTGTTGACAGTGAGCG-3') and
3MIR (5'-CTAAAGTAGCCCCTTGAATTCCGAGGCAGTAGGCA-3'). PCR products
were column-purified (Qiagen), digested with EcoRI and XhoI enzymes, and
cloned into predigested LMN vectors using T4 rapid ligase (Promega).
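
The two amplification primers carry the restriction sites used for cloning. As a quick check, the sketch below locates the XhoI (CTCGAG) and EcoRI (GAATTC) recognition sequences in the primers quoted above. The helper name is hypothetical, and because both sites are palindromic only one strand needs to be searched.

```python
PRIMERS = {
    "5MIR": "CAGAAGGCTCGAGAAGGTATATTGCTGTTGACAGTGAGCG",
    "3MIR": "CTAAAGTAGCCCCTTGAATTCCGAGGCAGTAGGCA",
}
SITES = {"XhoI": "CTCGAG", "EcoRI": "GAATTC"}

def find_sites(sequence, sites=SITES):
    """Return the 0-based start positions of each recognition site."""
    hits = {}
    for enzyme, motif in sites.items():
        hits[enzyme] = [i for i in range(len(sequence) - len(motif) + 1)
                        if sequence[i:i + len(motif)] == motif]
    return hits

for name, seq in PRIMERS.items():
    print(name, find_sites(seq))
```

The forward primer contains a single XhoI site and the reverse primer a single EcoRI site, consistent with directional cloning of the digested PCR product.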

Fig. S1. Characterization of H2b-GFP mammary gland label-retaining cells. Effects of doxycycline diet on K5tTa-H2b-GFP transgenic mouse mammary glands.
(A) Paraffin-embedded sections with DAPI nuclear staining and anti-GFP antibody. (B) Lineage depletion strategy. FACS analysis showing removal of PE-stained
red blood cells (Ter119+ cells), white blood cells (CD45+ cells), and endothelial cells (CD31+ cells) after magnetic bead lineage depletion. (C) H2b-GFP+ cells
gating strategy. Lin− mammary gland cells were first selected according to GFP expression (GFP− and GFP+) and further analyzed according to anti-CD24 and
anti-CD29 staining as displayed in Fig. 2. (D) FACS sorting strategy for transplantation assays. Lin− GFP chase cells, stained with 7-AAD for dead cell exclusion,
were divided based on GFP expression into H2b-GFP− mammary gland stem cells (MaSCs; CD24+CD29+GFP−) and H2b-GFP+ MaSCs (CD24+CD29+GFP+) and either
transplanted into cleared fat pads of prepubescent female mice or (E) carried through to colony-forming assays.

Fig. S2. Quantitative RT-PCR validation of mammary gland transcriptome. Lin− mammary gland cells (n = 10) were sorted into lysis buffer for quantitative
PCR using TaqMan probes. Cd24 and Cd29 mRNAs were included as controls. Samples were normalized to the levels of Gapdh expression. Error bars represent
SD among replicates. *P < 0.05 by the t test.

Fig. S3. Identification of MaSC cell surface markers. (A) Heat map of cell surface marker expression across all mammary gland cell types profiled. The cell
surface markers shown are the most abundantly expressed within the H2b-GFP+ MaSCs. (B) MaSC markers colony-forming assay. Cells were sorted using Cd24+Cd29+
alone (total MaSC) or Cd24+Cd29+ plus one of three markers: CD1d (CD1d+ MaSC), CD59a (CD59a+ MaSC), or CD22 (CD22+ MaSC). *Two hundred cells. (C)
Expression dendrogram for the most abundantly expressed genes across all mammary gland cells, including CD1d+ MaSCs and CD59a+ MaSCs. (D)
Representative whole-mount images from mammary glands injected with Cd1d+ MaSCs. *Scar tissue from cell injection. (E) Representative images from
paraffin-embedded sections of Cd1d+ MaSC-injected glands stained with H&E and anti-GFP immunohistochemistry (IHC) (Left Inset). (Scale bar: 2 mm; Left Inset, 100 µm.)

Fig. S4. Mammary gland-focused screen. (A) Screen strategy scheme. Plat-E cells were transfected with shRNAs as described in Materials and Methods.
Comma-Dβ cells were infected with the virus supernatant for 20 h; 48 h postinfection, GFP percent was quantified using the MACSQuant Cell Analyzer (Miltenyi
Biotech). T0 represents the GFP percent on day 2 postinfection, and T3 represents the GFP percent on day 12 postinfection. CD1d+ cells in mouse cell lines: (B)
Comma-Dβ cells, (C) 4T1 mouse breast cancer cells, and (D) C3-tag breast cancer model primary cells. (E) CD1d+ cells in the human cell line MDA-MB-468.


Dataset S1. Mammary reconstitution unit (MRU) frequency in H2b-GFP+ MaSC

Reconstituted mammary glands harvested 12 wk postinjection of either H2b-GFP+ MaSCs or H2b-GFP− MaSCs. A minimum of 25 outgrowths is required to be
considered a reconstituted gland. MRU frequency was estimated using the ELDA algorithm.


Dataset S2. Mammary gland pathway analysis

Top differentially expressed genes of all mammary gland cell types were analyzed for pathway enrichment and molecular functions using Ingenuity
Pathways Analysis (Ingenuity Systems). A minimum of 50 genes per cell type was analyzed.

Dataset S3. MRU frequency in CD1d+ MaSC

Total MaSC cells (CD24+CD29+) and CD1d+ MaSC cells (CD24+CD29+CD1d+) were isolated from the H2b-GFP transgenic mouse off doxycycline diet (GFP pulse).
Reconstituted mammary glands were harvested 12 wk after cell injection. A minimum of 25 outgrowths is required to be considered a reconstituted gland. MRU
frequency was estimated using the ELDA algorithm.